102 research outputs found
A Universal Part-of-Speech Tagset
To facilitate future research in unsupervised induction of syntactic
structure and to standardize best-practices, we propose a tagset that consists
of twelve universal part-of-speech categories. In addition to the tagset, we
develop a mapping from 25 different treebank tagsets to this universal set. As
a result, when combined with the original treebank data, this universal tagset
and mapping produce a dataset consisting of common parts-of-speech for 22
different languages. We highlight the use of this resource via two experiments,
including one that reports competitive accuracies for unsupervised grammar
induction without gold standard part-of-speech tags.
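As a rough illustration of what a treebank-to-universal mapping looks like, the sketch below converts a few Penn Treebank tags to the twelve universal categories. The dictionary is only an illustrative fragment, not the authors' full mapping file, and the names `PTB_TO_UNIVERSAL` and `to_universal` are hypothetical.

```python
# The twelve universal categories proposed in the abstract above.
UNIVERSAL_TAGS = {
    "NOUN", "VERB", "ADJ", "ADV", "PRON", "DET",
    "ADP", "NUM", "CONJ", "PRT", ".", "X",
}

# Illustrative fragment of a Penn Treebank mapping (assumption: not complete).
PTB_TO_UNIVERSAL = {
    "NN": "NOUN", "NNS": "NOUN", "NNP": "NOUN",
    "VB": "VERB", "VBD": "VERB", "VBZ": "VERB",
    "JJ": "ADJ", "RB": "ADV", "PRP": "PRON",
    "DT": "DET", "IN": "ADP", "CD": "NUM",
    "CC": "CONJ", "RP": "PRT", ".": ".",
}


def to_universal(tags, mapping=PTB_TO_UNIVERSAL, default="X"):
    """Map fine-grained tags to universal tags; unknown tags fall back to `default`."""
    return [mapping.get(tag, default) for tag in tags]


if __name__ == "__main__":
    print(to_universal(["DT", "NN", "VBD", "IN", "DT", "NN", "."]))
```

Falling back to the catch-all category `X` for unmapped tags is a design choice of this sketch; a real mapping would cover every tag in the source tagset.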
Structured Training for Neural Network Transition-Based Parsing
We present structured perceptron training for neural network transition-based
dependency parsing. We learn the neural network representation using a gold
corpus augmented by a large number of automatically parsed sentences. Given
this fixed network representation, we learn a final layer using the structured
perceptron with beam-search decoding. On the Penn Treebank, our parser reaches
94.26% unlabeled and 92.41% labeled attachment accuracy, which to our knowledge
is the best accuracy on Stanford Dependencies to date. We also provide in-depth
ablative analysis to determine which aspects of our model provide the largest
gains in accuracy.
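The core weight update of a structured perceptron can be sketched as follows. This is a minimal sketch under simplifying assumptions: features are sparse dictionaries, the predicted structure is supplied by the caller (in the abstract's setting it would come from beam-search decoding), and the function name `perceptron_update` and its parameters are hypothetical rather than taken from the paper.

```python
def perceptron_update(weights, gold_features, predicted_features, lr=1.0):
    """Move weights toward the gold structure and away from the prediction.

    All three feature arguments are dicts mapping feature name -> value.
    The update is w += lr * (phi(gold) - phi(predicted)), applied in place.
    """
    for name, value in gold_features.items():
        weights[name] = weights.get(name, 0.0) + lr * value
    for name, value in predicted_features.items():
        weights[name] = weights.get(name, 0.0) - lr * value
    return weights


if __name__ == "__main__":
    w = {}
    gold = {"shift:NN": 1, "left-arc:DT": 1}
    pred = {"shift:NN": 1, "right-arc:DT": 1}
    print(perceptron_update(w, gold, pred))
```

Features shared by the gold and predicted structures cancel out, so only the features on which the two disagree change weight.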
Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging
We consider the construction of part-of-speech taggers for resource-poor languages. Recently, manually constructed tag dictionaries from Wiktionary and dictionaries projected via bitext have been used as type constraints to overcome the scarcity of annotated data in this setting. In this paper, we show that additional token constraints can be projected from a resource-rich source language to a resource-poor target language via word-aligned bitext. We present several models to this end; in particular a partially observed conditional random field model, where coupled token and type constraints provide a partial signal for training. Averaged across eight previously studied Indo-European languages, our model achieves a 25% relative error reduction over the prior state of the art. We further present successful results on seven additional languages from different families, empirically demonstrating the applicability of coupled token and type constraints across a diverse set of languages.
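To make the idea of coupling token and type constraints concrete, here is a sketch of one plausible coupling rule. It is an assumption for illustration and not necessarily the rule used in the paper: the tag dictionary supplies the type constraint, the tag projected through the bitext supplies the token constraint, and the projected tag is trusted only when the dictionary does not contradict it. The function `allowed_tags` and its arguments are hypothetical names.

```python
def allowed_tags(word, projected_tag, tag_dict, all_tags):
    """Return the set of tags permitted for one token.

    word          -- the target-language token
    projected_tag -- tag projected via word alignment, or None if unaligned
    tag_dict      -- dict mapping word -> set of tags (type constraints)
    all_tags      -- the full tagset, used when nothing constrains the token
    """
    type_tags = tag_dict.get(word)
    if type_tags is None:
        # No dictionary entry: rely on the projection if there is one.
        return {projected_tag} if projected_tag is not None else set(all_tags)
    if projected_tag in type_tags:
        # Projection agrees with the dictionary: the token is fully constrained.
        return {projected_tag}
    # Projection missing or in conflict: fall back to the type constraint.
    return set(type_tags)


if __name__ == "__main__":
    tags = {"NOUN", "VERB", "ADJ"}
    dictionary = {"book": {"NOUN", "VERB"}}
    print(allowed_tags("book", "VERB", dictionary, tags))
```

The resulting sets define a lattice of permitted tag sequences, which is the kind of partial signal a partially observed model can be trained on.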
Temporal Analysis of Language through Neural Language Models
We provide a method for automatically detecting change in language across
time through a chronologically trained neural language model. We train the
model on the Google Books Ngram corpus to obtain word vector representations
specific to each year, and identify words that have changed significantly from
1900 to 2009. The model identifies words such as "cell" and "gay" as having
changed during that time period. The model simultaneously identifies the
specific years during which such words underwent change.
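One simple way to score how much a word has changed, given year-specific vectors, is the cosine distance between its vectors at two points in time. The sketch below assumes this measure and uses toy two-dimensional vectors; the function names and the data are hypothetical and are not taken from the paper.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def most_changed(vectors_start, vectors_end, top_k=10):
    """Rank words present in both dicts by cosine distance, largest first."""
    shared = set(vectors_start) & set(vectors_end)
    scored = [(w, cosine_distance(vectors_start[w], vectors_end[w])) for w in shared]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy vectors standing in for embeddings of two different years.
    start = {"cell": [1.0, 0.0], "tree": [0.0, 1.0]}
    end = {"cell": [0.0, 1.0], "tree": [0.0, 2.0]}
    for word, distance in most_changed(start, end):
        print(word, round(distance, 3))
```

Applying the same distance to consecutive years, rather than only the first and last, would give a per-year change curve for each word.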
Natural Language Processing with Small Feed-Forward Networks
We show that small and shallow feed-forward neural networks can achieve near
state-of-the-art results on a range of unstructured and structured language
processing tasks while being considerably cheaper in memory and computational
requirements than deep recurrent models. Motivated by resource-constrained
environments like mobile phones, we showcase simple techniques for obtaining
such small neural network models, and investigate different tradeoffs when
deciding how to allocate a small memory budget. Comment: EMNLP 2017 short paper
Universal Semantic Parsing
Universal Dependencies (UD) offer a uniform cross-lingual syntactic
representation, with the aim of advancing multilingual applications. Recent
work shows that semantic parsing can be accomplished by transforming syntactic
dependencies to logical forms. However, this work is limited to English, and
cannot process dependency graphs, which allow handling complex phenomena such
as control. In this work, we introduce UDepLambda, a semantic interface for UD,
which maps natural language to logical forms in an almost language-independent
fashion and can process dependency graphs. We perform experiments on question
answering against Freebase and provide German and Spanish translations of the
WebQuestions and GraphQuestions datasets to facilitate multilingual evaluation.
Results show that UDepLambda outperforms strong baselines across languages and
datasets. For English, it achieves a 4.9 F1 point improvement over the
state-of-the-art on GraphQuestions. Our code and data can be downloaded at
https://github.com/sivareddyg/udeplambda. Comment: EMNLP 201